Uj in 




i /? /j 



PARALLEL PROCESSORS AND NONLINEAR STRUCTURAL DYNAMICS ALGORITHMS 

AND SOFTWARE 

A ' ~ £ */-2 7^7 


Principal Investigator: Ted Belytscnko 

Final Technical Report 
March 1, 1986 to October 30, 1986 

Department of Civil Engineering 
Northwestern University 
Evanston, Illinois 60201 

NASA Research Grant NAG-1-650 

(N AS A -CR- 1798 89) PAR A1LEL PROCESSORS AND N87-11511 

NCNLINEAR STRUCTURAL DYNAMICS AIGORIIHMS AND 
CETiARE Final Technical Report, 1 Mar. - 

0 Oct. 1S86 (Northwestern Univ.) 35 p Unclas 

CSCL 09B G3/61 43965 





PREFACE 


This research was conducted under the direction of Professor Ted 
Belytschko. Participating research assistants were Noreen Gilbertsen and 
Bruce Engelmann. The help of Argonne National Laboratory, particularly Dr. 
James Kennedy, who provided access to several parallel computing machines is 


also appreciated. 

The following paper supported by NASA was accepted for publication: 


P. Smol inski and T. Belytschko, "Multi -Time Step Integration Using Nodal 
Partitions," accepted for publication. International Journal for Numerical 
Methods in Engineering. 



ABSTRACT 


A nonlinear structural dynamics program with an element library that 
exploits parallel processing is under development. The aim is to exploit 
scheduling-allocation so that parallel processing and vectorization can 
effectively be treated in a general purpose program. As a byproduct an 
automatic scheme for assigning time steps was devised. A rudimentary form of 
the program is complete and has been tested: it shows substantial advantage 
can be taken of parallelism. 

In addition, a stability proof for the subcycling algorithm has been 


developed 


1. INTRODUCTION 


The purpose of this project is to develop a nonlinear structural dynamics 
computer program with a library of elements which can effectively exploit a 
computer with concurrent processing capabilities and vectorization. The 
program will be developed in a form that is readily adaptable to the NICE test 
bed of NASA Langley. 

While substantial study has already been devoted to the treatment of 
large systems of partial differential equations (POE) on MIMD (Multiple 
Instruction - Multiple Data Stream Processors), see Ref. [1] the problems 
posed in the parallelization of general purpose of finite element computer 
programs are substantially different. In large PDE systems, the computations 
associated with each node of a mesh are usually in essense quite similar and 
most of the concern lies with the spatial partitioning of the mesh so that 
optimum utilization of the parallel character of the computer is achieved. 

On the other hand, in a general purpose finite element program, a major 
difficulty in parallelization is associated with the large variety of elements 
and constitutive models which are used in a computation. Thus, the most 
effective utilization of concurrent processing will usually not be that based 
on a simple subdivision of the spatial domain, since vectorization of a stream 
of dissimilar elements is almost impossible. The problem is further 
compounded by the practical necessity of post-processing the solution for 
graphic- display purposes in parallel with the computation. The practice of 
simply dumping all historical variables such as displacements, strains and 
stresses at all points into a storage device and then processing it subsequent 
to the calculation is usually unbelievably wasteful in a nonlinear structural 
dynamics computation because of the large amounts of time required for the 
communication and the large amounts of storage required. 
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Thus it can be seen that in a general purpose programs for nonlinear 
structural dynamics, the assignment - scheduling problem is quite crucial. 
Obviously, if one processor tends to lag behind the others, the benefits of a 
parallel architecture are quickly dissipated. As an example of the type of 
scheduling problem that arises in a typical nonlinear structural dynamics 
calculation, consider the nonlinear response of an elastic-plastic structure 
to an impulsive load. Initially the behavior of most of the structure will be 
elastic. However, as the deformation progresses, plastic response develops in 
some elements which will slow down the processing substantially. An efficient 
algorithm should be able to handle these changes in an effective manner 
without degrading performance. 

In this work, an explicit method has been chosen for the time 
integration. The architecture of an explicit time integration program is such 
that it can readily be expanded to an implicit time integration program based 
on an iterative solver such as the conjugate gradient method. This class of 
algorithms is most readily adapted to concurrent processing machines because 
the bulk of the computation time is devoted to element level operations, 
namely, obtaining element nodal forces from element nodal displacements via 
the strain-displacement equations, the constitutive law, and the volume 
integration of the semi-discrete divergence of the stress tensor, that is, the 
computation of the internal nodal forces. 

An effective implementation of an explicit time integration scheme in 
nonlinear structural dynamics requires that different time steps be used on 
different parts of the mesh. Otherwise, the presence of a few stiff elements 
in the model will entail the use of a small time step for the entire mesh. 
Therefore, the explicit time integration procedure in this project was 
designed so that different time steps could be used on different parts of the 



f 
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mesh. Furthermore, in conjunction with the vectorization and parallelli- 
zation, the program is being designed to automatically assign time steps to 
different parts of the mesh. Previous versions of time step partitions always 
required the user to select the domains which would use different time steps 
and for the partition to remain fixed throughout the computation. 

Although we had originally planned to implement our algorithms on a 
partitioned memory machine, initial planning indicated it would be better to 
take one step at a time and implement the algorithm in a shared memory 
machine. For this purpose, two computers which are available at Argonne 
National Laboratory were chosen: the Sequent Balance 8000/21000 and the 
Alii ant. The implementation of parallel processing in these machines at the 
Argonne center was facilitated substantially by the availability of a macro 
library of monitors which enables programmers to write portable FORTRAN Code 
for multi -processors [2]. These monitors are very similar in content to the 
FORCE monitors developed by Jordan [3], so eventual translation of this 
program to the NICE test bed of NASA Langley should be readily accomplished. 

As a framework of the initial studies, the program WHAMS, which is 
partially described in Ref. [4] and [5] was chosen. This program contains a 
4-node quadrilateral shell element, a beam element, and a spring element so 
that the allocation problem among different elements could effectively be 
studied. It is a three dimensional program which treats both geometric and 
material nonlinearities. 

An outline of the report is as follows. Section 2 describes the time 
integration algorithm. Section 3 gives some sample numerical results, 
followed by some conclusions. A stability proof for a model problem relevant 
to the subcycling procedure is given in the Appendix. 


2. ALGORITHM DESCRIPTION 


Central Difference Method 

A finite element model of a nonlinear structure is governed by the 
following equations: 


strain-displacement equations (in rate-form) 



constitutive equations (in rate-form) 

i -J'd. o) 

momentum equations 



^£ext ” ^int^ 


( 1 ) 


( 2 ) 


(3) 


f in* a s ! B T T dfi 
~int ,e ^ ~ ~ 


(4) 


In the above the following nomenclature has been used. 


B ... 

o ... 


f. . 

~int 


-ext *** 
M ... 

T, t ... 

iV 

u, u, u , 


strain-rate-velocity matrix 

velocity strains (strain rates) 

internal nodal forces 

external nodal forces 

mass matrix (assumed diagonal and lumped) 

Cauchy stress matrix and its frame invariant rate 
nodal displacements, velocities and accelerations, 
respectively 
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Q ... the domain of element e 

e 

^/... the constitutive equation (or algorithm) 

The subscript e will always indicate element-related variables. 

The central difference method uses the following equations to update the 
nodal variables 

5 n* V 2 , V 2 * lt ™n (5) 

/V /V 

u n+1 = u" ♦ St i n+ V 2 (6) 

In the above, superscripts designate time steps. 

An outline of the algorithm for explicit time integration is given in 
Table 1. As can be seen from the flow chart, the major opportunity for paral- 
lelization appears in the loop over the elements. The number of multipli- 
cations per element can vary from 50 to the order of 10^. Here we have a 
coarse-grained parallelism which is ideal for concurrent processors. However, 
if the parallelism is exploited on an element level, then the opportunities 
for any significant vectorization are lost. To exploit vectorization in 
conjunction with parallelism, it is necessary to arrange the elements in 
groups. The number of groups should be greater than the number of processors, 
but the size of each group is limited also by the auxiliary arrays which are 
needed for vectorization. Furthermore, for an efficient implementation of 
vectorization, only one type of element can occur in a group. 
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Table 1 

* 

Outline of Explicit Algorithms 

1. Initial conditions: u " ^ u ° 

rsS fW 

2. Update nodal velocities and displacements by Eqs. (5) and (6) 

3. Loop over all NE elements 

Loop over all NG quadrature points in element 

Compute velocity strains by Eq. (1) 

Evaluate constitutive law, Eq. (2) 

Evaluate T and add to integrand of Eq. (4) 

end of loop over quadrature points 

add f. + „ to total f. . array 
~int ,e ~int 

end of loop over elements 

4. Compute accelerations by Eq. (3); go to 2 

Explicit-Explicit Partition 

The concurrent procedure descrubed here is based on an explicit-explicit 
partitioning procedure, or subcycling procedure, first presented in [9,10]. 

In this method, the elements are separated into element groups, each of which 
can be integrated with a different time step subject to the following 
restrictions : 

1. The largest step must be an integer multiple of all time steps. 

2. If any node is shared by elements in two different integration 
groups, the time steps in these groups must be integer multiples of 
each other. 

In adapting this method to a paral lei -vector code, it was decided to make 
the alignment between the element groupings for vectorization and the time 



steps partitioning coincident. Thus any element group is automatically 
integrated by the same time step. The element grouping is performed by a 
preprocessor. 

For the purpose of defining how the explicit-explicit partitioning 
procedure works, we will define the following variables. 

NTGRP: number of groups into which the finite element mesh is 

subdivided 

At gi the time increment for element group G 

t Mac+ : the master time 

I IW V 

At Mas t : the master Time increment, which corresponds to the 

minimum Atg among all element groups G 
At gt the time increment for element group G. 

As stated previously, all element time step increments Atg must be 

integer divisors of At u . The maximum At~ is called At 

Mast G max 

The program is designed so that it automatically decides the appropriate 
nodal time step. This is accomplished by using the largest time increment for 
any element group connected to the node to update the node. In order to 
program this algorithm, each node therefore requires two additional words of 
storage: the nodal time t^ and the time increment used for that node, At^. 

The essence of the procedure is as follows. We call the time steps 


necessary to advance the master clock by At„ v a cycle. Within a cycle, 

niaX 

whenever the master clock t Ma$t is incremented by At Mgst , all element groups 
are first checked. If any element group is not ahead of the master time, i.e. 


if for group G 


t G < t Mast 


that element group is updated. This updating involves the calculation of new 
velocity strains, stresses and internal forces. The element internal forces 
are then added into the global internal force matrix. 

After all element groups have been updated to time T^ ast » the nodal loop 
for updating velocities and displacements is executed. In this loop, prior to 
updating the velocities and displacements, the nodal clock t N is checked. If 

*N 4 t Mast 

the nodal clock is behind the master clock, so the node is updated. In 
addition, the nodal clock is updated using the time increment for that node. 

The algorithm assumes that a velocity strain formulation is used for all 
element calculations. When an element needs to be updated, the latest 
available velocity is used to compute the velocity strain. This means if an 
element is connected to a node with a larger time step, it uses the same nodal 
velocity for all intermediate time steps. This corresponds to a constant 
velocity interpolation or a linear displacement interpolation, which has been 
shown to be stable. If displacements are needed by an element at an inter- 
mediate time step, linear interpolation based on the last cycle displacement 
with the current velocity as a slope is used. 

A flow chart for the procedure is shown in Table 2. 


Table 2 


Flowchart of Partitioned Explicit Algorithm 
1. Set initial condition 

u° = u ( 0 ) , u' 1/2 = u(0) , n = 0 , n2 = 0 

/V ^ ' A/ /V ' ' 1 9 


•• * 

initial accelerations are assumed to vanish u° = 0. 
2. Initialize clocks and cycle counters 


t 

t 

t 


Mast 

G 

N 


= 0 
= 0 
= 0 


master time 

for all element groups G 
for all nodes N 


3. If n =0 set, nodal time (f * 0) increments At^, subcycle counter n2 


4. Update nodes with clocks behind the master clock, i.e. t N < 

a. DO N = 1 to NNODE 

b. if t N > t Mast , skip node N 

c. compute accelerations ujJ * M f 

d. update nodal velocities: u^ + ^ 2 = u|J“ ^ 2 + At^ uJJ 

e. update nodal displacements: u|J +1 = uJJ + At^ u[J + ^ 2 

f. update nodal clock: t N t^ + At^ 


n+1 


5. Compute internal forces f!j n £ 


a - zer °lfnt 
b - 1f *S > ‘Mast- 


loop over element G 
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c. compute velocity strains: D^ + ^ 2 = B uj^ + ^ 2 

d. compute velocity strains: DjjJ + ^ 2 = g u^ + ^ 

e. update stress: Tj +1 = Tjj + Atg t J + 1/2 

f. compute element internal forces: N + / B T dV 

g. if n2 = 1, compute stable time increment for element 

h. assemble f?*} M into f!?*i 

~int, N ~int 

6. Update element group clocks t Q and n2 

7. Compute f ext 

8. If n2 * n2 , , set new element group time increment At r 

max r b 


3. NUMERICAL RESULTS 


At the present time, a version of WHAMS has been developed which includes 
beam and plate elements and a rudimentary form of parallelization. However, 
the conversion of the subroutines to take full advantage of vectorization has 
not been completed nor has the programming of the subcycling been completed. 

We will report results for 2 problems which have been solved. The first 
is a spherical cap problem; the problem description and mesh are shown in 
Table 3 and Fig. 1, respectively. A total of 91 nodes and 75 elements were 
used for the one-quarter model. A uniform step load was applied over the 
cap. The response compouted here is shown in Fig. 2 where it is compared to 
results given in Ref. [6], 

The computer times for various computer and various degrees of 
concurrency are reported in Table 4. In addition to the concurrency which is 
incorporated in the program, solution times for concurrency as developed by 
the compiler are given. 

The second problem which has been solved is shown in Fig. 3. In this 
case, one and two processors were used for the solution. The speedup is going 
from 1 to 2 processors in this case is 1.63, which is 80% of the theoretical 
speedup. 


Table 3. 


Material Properties and Parameters for Spherical Cap Problem 


Radius 

r = 22.27 in 

Thickness 

t = 0.41 in 

Angle 

a = 26.67° 

Density 

p = 2.45 x 10- 4 lb-sec 2 /in 4 

Young's modulus 

E = 1.05 x 10 7 psi 

Poisson's ratio 

ii 

o 

• 

u> 

Yield stress 

<jy = 2.4 x 10 4 psi 

Plastic modulus 

Ep = 2.1 x 10^ psi 

Pressure load 

P = 600 psi 

Solution times 

Table 4 

for spherical cap problem 1000 time steps 


ALLIANT (1 processor) 310.9 sec 
ALLIANT (compiler assigned concurrency on 8 processors) 116.5 sec 
ALLIANT (8 processors; programmer designed concurrency) 65 sec 
VAX 11/780 901.8 sec 
IBM 3033 75 sec 




displacement 



Figure 2. Center Displacement of Spherical Cap for Elastic and 
Elastic-Plastic Materials. 


15 


7 * /9 
7G 

96c 


d- node, p/afe elements 

beam elements (s'h'flfenersj 


degrees- oP- freedom 


/ooo l/me s£/*s 



Figure 3. Stiffened Shell Problem. 
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CONCLUSIONS 

A simple algorithm has been developed which permits a partition of time 
steps, so that different processors can run different groups of elements with 
different time steps. In addition, the method has been devised so that it can 
take advantage of vectorization and is quite automatic. A rudimentary form of 
the program has been completed and tested. It does not yet include vector- 
ization and the complete subcycling algorithm. Several examples run on the 
Alii ant computer show that the resulting algorithm can take relatively good 
advantage of concurrent processing. 

In addition, the stability of these numerical procedures has been 
studied. A proof of stability has been developed for linear first-order 
systems when nodal partitions are used. 
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1. INTRODUCTION 

In many engineering models that are composed of non-uniform 
meshes oftentimes a group of small or stiff elements forces the 
integration of the remainder of the mesh with a time step much 
smaller that its stable time step. To alleviate this difficulty 
subcycling, which uses different time steps in different portions 
of the mesh, has been developed. This eliminates the need to 
update the entire mesh with the stable time step of the smallest 
element, and requires much less additional programming in 
comparison to implicit-explicit time integration [1-3 j. 

In contrast to implicit-explicit integration, which has been 
analyzed by a variety of methods [3-5], there has been little 
done in the stability analysis of subcycling. One reason is that 
the multiple time steps introduce special difficulties not found 
in implicit-explicit integration with its single time step. Early 
stability analyses have employed simplifying assumptions [6] or 
failed to show that a complete set of real eigenvalues existed 

[7], The first rigorous proof of stability is given for a 

« 

subcycling algorithm which uses the same integration rule in the 
different partitions [8]. This scheme is modified [9] to use 
different values of a in the ot-algorithm along with different 
time steps in the different subdomains. The integration schemes 
in [8-10] are based on an element partition while the algorithm 
in [7] is based on a nodal partition. One major difference 
between these two types of partitions is that the nodal partition 
results in unsymmetric amplification matrices which complicate 
the analysis. 
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In this paper a stability analysis of a subcycling method 
which uses different values of a and a nodal partition is 
presented. Two features of this method are that no unsymmetric 
systems need be solved and that a simple procedure is used to 
establish sufficient conditions for stability. The element 
eigenvalue inequality theorem is used to bound the critical time 
step in terms of element eigenvalues. 

2. MIXED TIME INTEGRATION PROCEDURE 

The matrix equation governing linear diffusion processes is 
Mu + Ku = s (2'.1) 

where using the nomenclature of heat conduction M is the 
capacitance matrix, and K is the conductance matrix. The vectors 
u and s represent the nodal values of the dependent variables and 
source, respectively. A superposed dot denotes a time derivative. 
The matrix M is assumed to be diagonal or lumped and is positive 
definite, and the matrix K is symmetric and positive semidef inite. 
The problem consists of finding a function u = u(t) which 
satisfies governing equation (2.1) and the initial condition 

u(t=0)=u° (2.2) 

for all time t, where u° is a given vector. 

To develop the integration algorithm the linear one step 
integrator is used, which is given by 

u n+1 = u n + ( 1 - a)At u n + aAt u n+ ^ (2.3) 

where u n = u(nAt) and At is the time step. If the parameter a is 
equal to zero, the integration is called explicit because u n+ l 
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can be computed from historical data. Explicit integration is 
only conditionally stable, thus restricting the time step. For a 
> 0, the integration is called implicit with no restriction on At 
if a > i. 

For the purpose of introducing the mixed time integration 
procedure, the vector u is partitioned as 


u = 




rows 


N 0 rows 


(2.4) 


where the nodal groups A and B are integrated with time steps mAt 
and At, respectively, and where m is an integer such that m > 1. 
Since with this method m + 1 updates are needed to advance the 
solution one cycle from t to t + mAt, a counter k, which is set 
to zero it the start of each integration cycle and increased by 
one after every update, is used to keep track of the updates. 

The integration cycle is developed by writing Eq. (2.3) as 

(2.5) 


u k+1 = u k ♦ ltW k G k * 4tW k i k+1 


\2 “ 


where 


»i ■ 


" 0 

0 


0 

0 

- 

- 

X 

- 

- 



W 2 * 



0 

u - 


0 

°b! . 


for k^m (2.6) 


« k i - 


- - 


r _ 

m( 1 ~ a A )I 0 

„k 

ma A I 0 


W 2 = 


0 0 


0 0 



- - 


for k=m (2.7) 
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Here I is the unit matrix, and the W matrices are partitioned 
similar to u. 

To begin the integration cycle, equation (2.6) is used in 

conjunction with Eq. (2.5) to subcycle the solution m times in 

partition B with a time step At. To finish the cycle, nodal 

partition A is updated with the large time step mAt using Eqs. 

(2.5) and (2.7). After this process is complete, the solution 

has been advanced by mAt in both partitions. The parameters 

and ot 0 determine the integration algorithm for each partition. 

# 

Considering the homogeneous case (s = 0), since this is 
sufficient to examine stability, equation ( 2 . 1 ) is solved for u 
and substituted into Eq. (2.5) to yield after some manipulation 
(M + AtW^K ) u ^ +1 = (M - AtW^K)u k (2.8) 

or 

?2“ k+1 - *i“ k (2 - 9 > 

where 

A* = M + AtW^K (2.10a) 

A* = M - AtW*K (2.10b) 

Using the definitions of W and the above equations, we find that 
for k =0 to m - 1 : 



0 


““b^BA 


* AtC, B?BB 


(2.11a) 



A k = 

.1 


«A 


’ At(1 - «B>5 bA 
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(2.11b) 


Mg - At(l - a 0 )K BB 


and similarly for k = m if it can be shown that 


’?A + mAta A^AA 


A k = 

,2 


mAta A«AB 


«B 


(2.12a) 


M. - mAt ( 1 - 


V«AA 


A k = 

.1 


-mAt(l a A^AB 


”B 


(2.12b) 


Note that even though the matrices A^ are not symmetric, no 
unsymmetric systems need be solved since the partition containing 
only the capacitance matrix is evaluated prior to solving for the 
remaining nodal partition. For more details about the 
implementation of this method see [7] in which a similar method 
is presented. 


3. STABILITY ANALYSIS 

To investigate the stability characteristics of the evolution 
equation (2.9) it is necessary to examine the associated 

generalized eigenvalue problem, which can be written as 

k k 

A* x = X A* X 


(3.1) 
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Considering first the subcycle update, equation (2.11) is 
substituted into the above equation and with some rearrangement 
gives 


0 

0 


M. 

0 



_ (1 - X) 

-A 


--BA 

-BB . 

2 " At ( 1 - a B + Ato B ) 

0 

«B. 


(3.2) 


By defining 
U = 


(1 ~ X) 


At(l - c* B + Ata 0 ) 


and partitioning x as 


x =< 


it follows that 


(3.3) 


(3.4) 


' 0 

0 


f > 

i 


"?A 

0 


f \ 

y 

.EbA 

Ebb. 

< 

■ 

z 

k 4 

► = U 

0 

Eb . 

< 

Z 

k 4 


(3.4) 


where the dimension of the problem is N. Writing out the above 
equations gives 

0 = u M A y (3.5a) 


K BA y + K nn 2 = 


BB : 


U Eb ? 


(3.5b) 


Due to the positive definiteness of M^, the first equation implies 
either u = 0 or y = 0. Assume that for i = 1 to Nq, y = 0, while 
for i = Ng + 1 to N, Hi = 0. We now consider 


CASE I y = 0 i = 1 to N b 

The second of the above equations gives 

?BB fi = U i ?B ?i 111 t0 N B <3 - 6) 


Here Ng is the dimension of this reduced problem, obtained by 
deleting rows and columns from the standard matrices. Note that 
Kgg and Mg are symmetric, so that Zj. may be orthogonalized with 
respect to Kgg and Mg. Also, since Kgg is a constrained version 
of K, Kgg is not singular and K“^ exists. Thus x^ spans R Nb and 
is given by 


x 


i 




i = 1 to N B 


(3.7) 


CASE II 


Ui = 0 


i = Ng + 1, N 


Then the second equation gives 


«BA ?i + «BB ?i = ! 


i = Ng + 1, N 


(3.8) 


or 

!i - ?BB 5 ba Hi i = n B + L ' N < 3 - 

A set of vectors, yi(i = Ng + 1, N) , can be choosen which span 
R Na and the corresponding is given by Eq. (3.9). So the 


eigenvectors can be written as 


/ ^ 


f y 


lL 


n 


< 

11 


> i = N B + 1 to N (3.10) 

*i 


“-BB -BA ?i 


V J 


J 



Since N = + Nq the set of vectors given by Eqs. (3.7) 

and (3.10) span R N . Thus, two possible sets of eigenvectors and 
eigenvalues are given by 



Mi ? 0 i = 1 to N b 


(3.11a) 



< \ 

?i 

< > 
"-BB -BA ^i 


M- = 0 i = N_ + 1 to N (3.11b) 

1 o 


where the bar designates the eigenvectors corresponding to p = 0. 
The vectors xi are orthogonal to each other and to x^ with respect 
to K. To show the first fact, x?Kxi is written as 



Hk 

K "1 


f \ 

o 

-AA 

-AB 



^BA 

-BB_ 

< 

?i 

** 



i 4 




i,j = 1 to », (3 . 12a) 

i / j 


= 6 


1 3 


(3.12b) 


since is orthogonal with respect to Kbb* The second condition 
is shown by considering 







/ ' 

■?AA 

5ab' 

i 

- 3 




-l 

_*BA 



"-BB-BA^j 
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i = 1 to Ng 
j = Ng+1 to N 


(3.13a) 


?i !^BA -j ~ Z ’ ^nn ^nn ^ni Y- 


:i "BB -BB "BA i j 


= 0 


(3.13b) 

(3.13c) 


Recalling that from Eq. (3.3) 




(1 - x L ) 


l at ( i - <*g + X ) 


i = 1 to N 


or 


X i = 


1 + At U i (a B - 1) 

(1 + AtagUj^) 


i = 1 to N 


(3.13) 


For the time being, we will assume that stability will be defined 
as l^il <. 1 or from the above equation 


-1 

for all 
to 


1 + At ii. (ot R - 1) 

< r-r-A — < 1 

1 + At 

i = 1 to N. However, if Ui > 0 then Eq. 


(3.14) 

(3.14) reduces 


At < 
Note that 


2 

Ui(l - 2a B ) 
if u 1 = 0 , Eq . 


(3.13) implies 


Xi = 1 


(3.15) 


(3.16) 


The analysis of the large time step update (k = m) proceeds 
by substituting Eq. (2.12) into Eq. (3.1) and rearranging to give 
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(3.17) 


where 


y = 


a - hlA 


mAt(l - + \'a ft ) 


(3.18) 


and where the prime designates quantities associated with this 
update. By arguments similar to those used in the previous update 
analysis, two possible sets of eigenvectors, which span R N , and 


their corresponding eigenvalues can be shown to be 



(3.19a) 


/ ^ 


f “I ^ 



xi 

> = < 


► , yj = 0 i = N ft +1 to N 

(3.19b) 
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where z|(i = + 1 to N) are choosen to span R N ® and 

y7(i = 1 to Nft) is calculated from 


K... y- 
_AA L 1 


= y. 


*?A li 


(3.20) 


Also, the set of vectors x7 can be shown to be orthogonal to each 
other and x7 with respect to K. 


From Eq. (3.18) we have that 


1 + mAt P^(a A - 1) 


■ 


(1 + mAtct A up 


i = 1 to N 


(3.21) 


Using the stability criterion j X £ | < 1 and assuming that u? > 0, 
Eq. (3.21) gives 


At < 


- m(l - 2a ft )uJ 


i = 1 to N 


(3.22) 


and Eq. (3.21) states that X£ = 1 for u? = 0. 

To analyze the updating procedure, the solution at any 
arbitrary time is expanded in terms of the eigenvectors as 


? * 8 iifi * Ulj 


sum on i, ] 


(3.23) 


Note that the above expan^son is valid for either update, however, 
the range of i and j and the actual vectors will vary depending 
on whether the subcycle or large time step update are considered. 

Substituting the above expression into Eq. (2.9) and taking 
into account Eq. (3.1), the updated solution can be written as 

d k+1 = S i X i x i + y jX j sum of i, j (3.24) 


since Xj = 1 for xj. Defining the norm 

E = d T Kd (3.25) 

and using the fact that vectors Xi are orthogonal to each other 
and to xj with respect to K, substituting Eq. (3.23) gives 


E k = B i B t x? KXj * y^ Kx m 


sum on i, j, l, m 


(3.26a) 


f 
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ek * 8 i u i * 5im 


Similarly, substituting Eq. (3.24) gives 


1c + 1 2 2 — T — 

E = B i X i u i + y j Y m Xj Kx m sum on r , 3 , m 


(3.27) 


If stability is defined as |AjJ < 1, as was previously assumed, 
then it is apparent from the above two equations that E k + 1 < E^ . 
Moreover, since this norm decreases or remains constant for each 
update, it and thus d, remain bounded for all time. 

In determining the critical time step the element eigenvalue 


theorem will be used which states that 


u < MAX u 
max — max 


for all e 


(3.28) 


where u max is the eigenvalue of an assemblage of elements subject 
to arbitrary constraints and u e is the maximum unconstrained 
eigenvalue for any of the elements [ 8 , 11, 12]. For the subcycle 
to satisfy the stability condition, the time step must be choosen 
according to Eq. (3.15) where Ui is given by Eq. (3.6). Equation 
( 3 . 6 ) represents a constrained version of the eigenvalue problem 
obtained by assembling all the elements which contain a node of 


partition B so that 


V < MAX u 
max — 


for all e e S. 


3.29) 


where Sg is comprised of elements which have a node in partition 
B. Using the above inequality Eq. (3.15) is satisfied if 


At < 


for all e e S, 


(1 - 2a_) u 
' B /V max 


(3.30) 


• » 
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Similar arguments can be used for the large time step update to 
bound from Eq. (3.20) as 

C i “ C for all e e S ft (3.31) 

The above inequality can be used in Eq. (3.22) to provide a bound 
for the critical time step in the form 


At < 


for all e e S, 


(3.32) 


■U ' 2a A> u max 


Equations (3.30) and (3.32) give a convenient and- easily calculated 


c. c ~ i e»4-cir\e anH f i mo e on rah in fnr hhp 

1_ W L ill LUL tuu^o X 11^ LUC v. XUIC ^ '-***'-» v- — ^ ^ ^ 


subcycling algorithm. 
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a RSTg aer 


A nonl inear structural dynamics program with an element library that 
exploits parallel processing is under development. The aim is to exploit 
scheduling-allocation so that parallel processing and vectorization can 
effectively be treated in a general purpose program. As a byproduct an 
automatic scheme for assigning time steps was devised. A rudimentary form of 
the program is complete and has been tested: it shows substantial advantage 


can be taken of parallelism. 


In addition, a stability proof for the 
developed. 


subcycling algorithm has been 


^yO'3/Q 


